AITopics | music signal

Unsupervised vocal dereverberation with diffusion-based generative models

Saito, Koichi, Murata, Naoki, Uesaka, Toshimitsu, Lai, Chieh-Hsin, Takida, Yuhta, Fukui, Takao, Mitsufuji, Yuki

arXiv.org Artificial Intelligence, Nov-8-2022

Removing reverb from reverberant music is a necessary step in cleaning up audio for downstream music manipulation. Reverberation in music falls into two categories: natural reverb and artificial reverb. Artificial reverb is more diverse than natural reverb because of its many parameter settings and reverberation types. However, recent supervised dereverberation methods may fail because they rely on sufficiently diverse and numerous pairs of reverberant observations and retrieved data for training in order to generalize to unseen observations during inference. To resolve these problems, we propose an unsupervised method that can remove a general kind of artificial reverb from music without requiring paired data for training. The proposed method is based on diffusion models: it initializes the unknown reverberation operator with a conventional signal processing technique and simultaneously refines the estimate with the help of diffusion models. We show through objective and perceptual evaluations that our method outperforms the current leading vocal dereverberation benchmarks.

diffusion model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2211.04124

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.41)

Perception-Aware Attack: Creating Adversarial Music via Reverse-Engineering Human Perception

Duan, Rui, Qu, Zhe, Zhao, Shangqing, Ding, Leah, Liu, Yao, Lu, Zhuo

arXiv.org Artificial Intelligence, Jul-26-2022

Recently, adversarial machine learning attacks have posed serious security threats against practical audio signal classification systems, including speech recognition, speaker recognition, and music copyright detection. Previous studies have mainly focused on ensuring the effectiveness of attacking an audio signal classifier by creating a small noise-like perturbation on the original signal. It is still unclear whether an attacker can create audio signal perturbations that are both effective as an attack and well perceived by human listeners. This is particularly important for music signals, as they are carefully crafted with human-enjoyable audio characteristics. In this work, we formulate the adversarial attack against music signals as a new perception-aware attack framework, which integrates a human study into adversarial attack design. Specifically, we conduct a human study to quantify human perception with respect to a change in a music signal. We invite human participants to rate their perceived deviation based on pairs of original and perturbed music signals, and reverse-engineer the human perception process by regression analysis to predict the human-perceived deviation given a perturbed signal. The perception-aware attack is then formulated as an optimization problem that finds an optimal perturbation signal to minimize the prediction of perceived deviation from the regressed human perception model. We use the perception-aware framework to design a realistic adversarial music attack against YouTube's copyright detector. Experiments show that the perception-aware attack produces adversarial music with significantly better perceptual quality than prior work.
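
The two stages described in the abstract above (regress human ratings, then minimize the predicted deviation) can be illustrated with a very small sketch. Everything specific here is an assumption made for illustration: the single feature (perturbation RMS), the linear model, the candidate list, and the `fools_classifier` predicate are hypothetical and are not taken from the paper, which does not state its features or optimizer in this abstract.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def rms(signal):
    """Root-mean-square level of a perturbation (hypothetical feature)."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def pick_perturbation(candidates, fools_classifier, intercept, slope):
    """Among candidates that still fool the classifier, return the one
    with the lowest predicted perceived deviation."""
    feasible = [p for p in candidates if fools_classifier(p)]
    return min(feasible, key=lambda p: intercept + slope * rms(p))
```

A usage sketch: fit `fit_line` on (feature, mean rating) pairs from a listening study, then pass the fitted coefficients to `pick_perturbation`. If no candidate fools the classifier, `min` raises `ValueError`, which a caller would need to handle.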

artificial intelligence, machine learning, perturbation, (20 more...)

arXiv.org Artificial Intelligence

2207.13192

Country:

North America > United States > Florida > Hillsborough County > Tampa (0.04)
North America > United States > Florida > Hillsborough County > University (0.04)
North America > United States > District of Columbia > Washington (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

MS-SincResNet: Joint learning of 1D and 2D kernels using multi-scale SincNet and ResNet for music genre classification

Chang, Pei-Chun, Chen, Yong-Sheng, Lee, Chang-Hsing

arXiv.org Artificial Intelligence, Sep-18-2021

In this study, we propose a new end-to-end convolutional neural network, called MS-SincResNet, for music genre classification. MS-SincResNet prepends a 1D multi-scale SincNet (MS-SincNet) to a 2D ResNet as the first convolutional layer in an attempt to jointly learn 1D kernels and 2D kernels during the training stage. First, an input music signal is divided into a number of fixed-duration (3 seconds in this study) music clips, and the raw waveform of each music clip is fed into the 1D MS-SincNet filter learning module to obtain three-channel 2D representations. The learned representations carry rich timbral, harmonic, and percussive characteristics compared with spectrograms, harmonic spectrograms, percussive spectrograms, and Mel-spectrograms. ResNet is then used to extract discriminative embeddings from these 2D representations. The spatial pyramid pooling (SPP) module is further used to enhance the feature discriminability, in terms of both time and frequency aspects, to obtain the classification label of each music clip. Finally, a voting strategy is applied to summarize the classification results from all 3-second music clips. In our experimental results, we demonstrate that the proposed MS-SincResNet outperforms the baseline SincNet and many well-known hand-crafted features. Considering individual 2D representations, MS-SincResNet also yields competitive results with the state-of-the-art methods on the GTZAN dataset and the ISMIR2004 dataset. The code is available at https://github.com/PeiChunChang/MS-SincResNet
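
The clip-splitting and voting steps described in the abstract above can be sketched as follows. This is only an illustration of the outer pipeline, not the authors' code (which lives at the repository linked above): the function names are hypothetical, `classify_clip` stands in for the trained network, majority voting is assumed as the voting rule, and dropping a trailing partial clip is an assumption.

```python
from collections import Counter

def split_into_clips(samples, sample_rate, clip_seconds=3.0):
    """Split a waveform into fixed-duration clips.

    A trailing remainder shorter than one clip is dropped (an assumption).
    """
    clip_len = int(sample_rate * clip_seconds)
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]

def vote(clip_labels):
    """Return the most frequent clip-level label (first seen wins a tie)."""
    return Counter(clip_labels).most_common(1)[0][0]

def classify_track(samples, sample_rate, classify_clip):
    """Label every clip with classify_clip, then vote over the clip labels."""
    clips = split_into_clips(samples, sample_rate)
    return vote([classify_clip(clip) for clip in clips])
```

Here `classify_clip` is any callable that maps one clip to a genre label, so the sketch can be exercised with a stub before a real model is plugged in.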

classification, genre classification, representation, (14 more...)

arXiv.org Artificial Intelligence

2109.0891

Country:

Asia > Taiwan > Taiwan Province > Taipei (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Europe > Italy (0.04)

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Translating music to predict a musician's body movements

#artificialintelligence, Jun-23-2018, 23:08:32 GMT

When pianists play a musical piece on a piano, their bodies react to the music. Their fingers strike the piano keys to create music. They move their arms to play in different octaves. Violin players draw the bow across the strings with one hand and lightly touch or pluck the strings with the fingers of the other hand. Faster bowing produces a faster musical pace.

artificial intelligence, machine learning, video, (17 more...)

#artificialintelligence

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.34)

Convolutional Neural Network Achieves Human-level Accuracy in Music Genre Classification

Dong, Mingwen

arXiv.org Artificial Intelligence, Feb-26-2018

Music genre classification is one example of content-based analysis of music signals. Traditionally, human-engineered features have been used to automate this task, achieving 61% accuracy in 10-genre classification. However, this is still below the 70% accuracy that humans can achieve on the same task. Here, we propose a new method that combines knowledge from human perception studies of music genre classification with the neurophysiology of the auditory system. The method works by training a simple convolutional neural network (CNN) to classify a short segment of the music signal. The genre of a piece of music is then determined by splitting it into short segments and combining the CNN's predictions from all of the segments. After training, this method achieves human-level (70%) accuracy, and the filters learned in the CNN resemble the spectrotemporal receptive field (STRF) in the auditory system.
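
The abstract above does not say how the segment-level predictions are combined. One common choice, assumed here purely for illustration, is to average the per-segment class probabilities and take the highest mean. The function name and the input layout (one list of class probabilities per segment) are hypothetical.

```python
def average_segment_probabilities(segment_probs):
    """Average per-segment class probabilities over all segments.

    segment_probs: list of lists, one probability vector per segment.
    Returns (index of the best class, list of mean probabilities).
    """
    n_segments = len(segment_probs)
    n_classes = len(segment_probs[0])
    mean_probs = [
        sum(p[k] for p in segment_probs) / n_segments
        for k in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda k: mean_probs[k])
    return best, mean_probs
```

Averaging keeps the confidence of each segment in play, whereas a hard majority vote would count a 51% segment the same as a 99% one; which of the two the paper uses is not stated in this abstract.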

accuracy, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1802.09697

Country: Asia > Middle East > Israel (0.05)

Genre: Research Report (0.40)

Industry:

Media > Music (0.47)
Leisure & Entertainment (0.47)
Health & Medicine > Therapeutic Area > Neurology (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

A Categorical Approach for Recognizing Emotional Effects of Music

Ardakani, Mohsen Sahraei, Arbabi, Ehsan

arXiv.org Machine Learning, Sep-17-2017

Recently, digital music libraries have been developed and can be easily accessed. Recent research has shown that the current organization and retrieval of music tracks based on album information is inefficient. Moreover, it has demonstrated that people use emotion tags for music tracks in order to search for and retrieve them. In this paper, we discuss the separability of a set of emotional labels, proposed in the categorical emotion expression, using Fisher's separation theorem. We determine a set of adjectives to tag music parts: happy, sad, relaxing, exciting, epic and thriller. Temporal, frequency and energy features have been extracted from the music parts. The maximum separability within the extracted features occurs between relaxing and epic music parts. Finally, we have trained a classifier using Support Vector Machines to automatically recognize and generate emotional labels for a music part. The accuracy of recognizing each label has been calculated; the results show that epic music can be recognized more accurately (77.4%) than the other types of music.
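
As a rough illustration of measuring separability between two emotion labels, the sketch below uses one common form of Fisher's criterion for a single scalar feature: the squared difference of class means divided by the sum of class variances. Whether the paper uses exactly this form is an assumption, and the function names and any feature values passed in are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance

def fisher_ratio(a, b):
    """Fisher criterion for one scalar feature:
    (mean_a - mean_b)^2 / (var_a + var_b).

    Raises ZeroDivisionError if both classes have zero variance.
    """
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b))

def most_separable_pair(features_by_label):
    """Return the pair of labels with the largest Fisher ratio.

    features_by_label: dict mapping a label to a list of feature values.
    """
    return max(
        combinations(sorted(features_by_label), 2),
        key=lambda pair: fisher_ratio(
            features_by_label[pair[0]], features_by_label[pair[1]]
        ),
    )
```

A larger ratio means the two labels overlap less along that feature, which is the sense in which one pair of labels can be called the most separable.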

artificial intelligence, emotion, machine learning, (16 more...)

arXiv.org Machine Learning

1709.05684

Country:

North America > United States (0.28)
Asia > Middle East > Iran (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)

Gaussian Processes for Music Audio Modelling and Content Analysis

Alvarado, Pablo A., Stowell, Dan

arXiv.org Machine Learning, Jun-10-2016

Real music signals are highly variable, yet they have strong statistical structure. Prior information about the underlying physical mechanisms by which sounds are generated, and the rules by which complex sound structure is constructed (notes, chords, a complete musical score), can be naturally unified using Bayesian modelling techniques. Typically, algorithms for Automatic Music Transcription carry out individual tasks such as multiple-F0 detection and beat tracking independently. The challenge remains to perform joint estimation of all parameters. We present a Bayesian approach for modelling music audio and for content analysis. The proposed methodology, based on Gaussian processes, seeks joint estimation of multiple music concepts by incorporating into the kernel prior information about the non-stationary behaviour, dynamics, and rich spectral content present in the modelled music signal. We illustrate the benefits of this approach via two tasks: pitch estimation, and inferring missing segments in a polyphonic audio recording.
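
To give a feel for how prior information in a kernel lets a Gaussian process fill in a missing segment, here is a minimal sketch using a standard periodic kernel and the usual posterior-mean formula. It is not the paper's model: the kernel choice, the hyperparameters, the jitter value, and all function names are assumptions, and a plain Gaussian elimination stands in for a proper Cholesky solver.

```python
import math

def periodic_kernel(t1, t2, period=1.0, length_scale=1.0):
    """Standard periodic covariance: exp(-2 sin^2(pi |t1-t2| / p) / l^2)."""
    s = math.sin(math.pi * (t1 - t2) / period)
    return math.exp(-2.0 * s * s / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_posterior_mean(train_t, train_y, test_t, kernel, noise=1e-6):
    """Posterior mean k(t*, T) (K + noise * I)^-1 y at each test time."""
    n = len(train_t)
    K = [
        [kernel(train_t[i], train_t[j]) + (noise if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    alpha = solve(K, train_y)
    return [
        sum(kernel(t, train_t[i]) * alpha[i] for i in range(n))
        for t in test_t
    ]
```

With samples of a 1 Hz sinusoid observed over two periods and a gap cut out of the first period, the periodic kernel lets the observations from the second period inform the gap, so the posterior mean there tracks the true waveform closely.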

artificial intelligence, covariance function, machine learning, (17 more...)

arXiv.org Machine Learning

1606.01039

Genre: Research Report (0.50)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Separation of Music Signals by Harmonic Structure Modeling

Zhang, Yun-gang, Zhang, Chang-shui

Neural Information Processing Systems, Dec-31-2006

Separation of music signals is an interesting but difficult problem. It is helpful for many other music research tasks, such as audio content analysis. In this paper, a new music signal separation method is proposed, which is based on harmonic structure modeling. The main idea of harmonic structure modeling is that the harmonic structure of a music signal is stable, so a music signal can be represented by a harmonic structure model. Accordingly, a corresponding separation algorithm is proposed. The main idea is to learn a harmonic structure model for each music signal in the mixture, and then separate signals by using these models to distinguish harmonic structures of different signals. Experimental results show that the algorithm can separate signals and obtain not only a very high Signal-to-Noise Ratio (SNR) but also a rather good subjective audio quality.
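
The stability claim in the abstract above can be illustrated with a toy example. Here a "harmonic structure" is taken to be the amplitudes of the first few harmonics normalized by the fundamental; this is one plausible representation chosen for illustration and not necessarily the authors' definition. The synthetic instrument, the known fundamental frequency, and all function names are assumptions.

```python
import math

def harmonic_amplitudes(signal, sample_rate, f0, n_harmonics=3):
    """Estimate the amplitude at each multiple of f0 by projecting the
    signal onto a cosine and a sine at that frequency."""
    n = len(signal)
    amps = []
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k * f0 / sample_rate
        re = sum(x * math.cos(w * i) for i, x in enumerate(signal))
        im = sum(x * math.sin(w * i) for i, x in enumerate(signal))
        amps.append(2.0 * math.hypot(re, im) / n)
    return amps

def harmonic_structure(signal, sample_rate, f0, n_harmonics=3):
    """Harmonic amplitudes normalized by the fundamental's amplitude."""
    amps = harmonic_amplitudes(signal, sample_rate, f0, n_harmonics)
    return [a / amps[0] for a in amps]

def synth_note(f0, weights, sample_rate=8000, n_samples=800):
    """Toy instrument: a sum of sinusoids at multiples of f0."""
    return [
        sum(w * math.sin(2.0 * math.pi * (k + 1) * f0 * i / sample_rate)
            for k, w in enumerate(weights))
        for i in range(n_samples)
    ]
```

Two notes from the same toy instrument, for example `synth_note(200.0, [1.0, 0.5, 0.25])` and `synth_note(400.0, [1.0, 0.5, 0.25])`, yield the same normalized structure even though their pitches differ, which is the kind of per-source signature a separation method could use to tell sources apart.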

algorithm, harmonic structure, music signal, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Finland > Pirkanmaa > Tampere (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Media > Music (0.87)
Leisure & Entertainment (0.87)

Technology:

Information Technology > Artificial Intelligence > Speech (0.49)
Information Technology > Data Science > Data Mining (0.35)

Separation of Music Signals by Harmonic Structure Modeling

Zhang, Yun-gang, Zhang, Chang-shui

Neural Information Processing Systems, Dec-31-2006

Separation of music signals is an interesting but difficult problem. It is helpful for many other music research tasks, such as audio content analysis. In this paper, a new music signal separation method is proposed, which is based on harmonic structure modeling. The main idea of harmonic structure modeling is that the harmonic structure of a music signal is stable, so a music signal can be represented by a harmonic structure model. Accordingly, a corresponding separation algorithm is proposed. The main idea is to learn a harmonic structure model for each music signal in the mixture, and then separate signals by using these models to distinguish harmonic structures of different signals. Experimental results show that the algorithm can separate signals and obtain not only a very high Signal-to-Noise Ratio (SNR) but also a rather good subjective audio quality.

artificial intelligence, data mining, music signal, (14 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Industry:

Media > Music (0.87)
Leisure & Entertainment (0.87)

Technology:

Information Technology > Data Science > Data Mining (0.35)
Information Technology > Artificial Intelligence > Speech (0.31)
